For this analysis of HappyDB, I wanted to explore a personal curiosity: what brings happiness to peers in my age group (26-30)? There is a popular belief that girls mature, or “reach adulthood”, a few years earlier than boys.
Can we draw any inferences from the happy moments of millennials? How do men and women differ in this regard?
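Narrowing the data to the 26-30 age group can be sketched as follows. This is a minimal base-R sketch on made-up rows, assuming a demographic table with `wid`, `age` (stored as text) and `gender` columns like HappyDB's; it is not necessarily how this report filtered the data. Converting a text column with `as.numeric()` turns anything that is not a number into `NA`, which is exactly when R prints "NAs introduced by coercion"; `which()` then drops those rows.

```r
# Made-up stand-in for HappyDB's demographic table (assumed columns: wid, age, gender).
# Ages are stored as text, so some entries are not valid numbers.
demographic <- data.frame(
  wid    = 1:6,
  age    = c("27", "29.0", "prefer not to say", "35", "26", "30"),
  gender = c("m", "f", "m", "f", "f", "m"),
  stringsAsFactors = FALSE
)

# as.numeric() turns non-numeric ages into NA; suppressWarnings() silences
# the "NAs introduced by coercion" warning this conversion would otherwise print.
demographic$age_num <- suppressWarnings(as.numeric(demographic$age))

# Keep workers aged 26-30 inclusive; which() ignores NA, so unparseable ages are dropped.
in_range <- which(demographic$age_num >= 26 & demographic$age_num <= 30)
peers <- demographic[in_range, ]
peers[, c("wid", "age_num", "gender")]
```

Using `which()` rather than a bare logical index avoids rows of `NA`s appearing in `peers` for the unparseable ages.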
Overall, women appear to draw more happiness from bonding than from achievement, while men put achievement first.
The words “friends”, “day”, and “time” feature heavily for both sexes, but the #4 word is “husband” for women and “played” for men; “wife” does not appear until #10.
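Ranking words by frequency within each sex comes down to tokenizing, removing stop words, and counting per group. The sketch below does this in base R on made-up sentences with a toy stop-word list (`toy_stop_words` and `top_words` are hypothetical names); the report's actual tokenizer and stop-word list are not shown and may differ, for example by stemming words.

```r
# Made-up happy moments tagged with gender (not HappyDB data).
moments <- data.frame(
  gender = c("f", "f", "m", "m"),
  text   = c("Dinner with my husband and friends",
             "My husband surprised me at work",
             "I played video games with friends",
             "I played a new video game all day"),
  stringsAsFactors = FALSE
)

# Hypothetical toy stop-word list; a real analysis would use a fuller one.
toy_stop_words <- c("a", "all", "and", "at", "i", "me", "my", "new", "with")

# Lower-case, split on non-letters, drop empty strings and stop words.
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z]+"))
  words[words != "" & !(words %in% toy_stop_words)]
}

# Count words and return the n most frequent.
top_words <- function(texts, n = 3) {
  counts <- table(tokenize(texts))
  head(sort(counts, decreasing = TRUE), n)
}

# split() groups the texts by gender; lapply() ranks each group separately.
top_by_gender <- lapply(split(moments$text, moments$gender), top_words)
top_by_gender
```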
The #1 bigram for men aged 26-30 is… “video games”! “Played video” and “played games” also appear.
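A bigram is simply a pair of adjacent words. The base-R sketch below pairs words first and then drops any pair in which either word is a stop word, which is the approach commonly used with tidytext (`unnest_tokens(..., token = "ngrams", n = 2)` followed by filtering). The sentences and the toy stop-word list are made up, and this is not necessarily the report's exact pipeline.

```r
# Hypothetical toy stop-word list.
toy_stop_words <- c("a", "after", "i", "my", "some", "the", "with")

# Return the bigrams of one text, dropping any pair that contains a stop word.
bigrams <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words <- words[words != ""]
  if (length(words) < 2) return(character(0))
  first  <- words[-length(words)]
  second <- words[-1]
  keep <- !(first %in% toy_stop_words) & !(second %in% toy_stop_words)
  paste(first[keep], second[keep])
}

# Made-up happy moments; count bigrams across all of them.
men <- c("I played video games with my brother",
         "Played some video games after work")
bigram_counts <- sort(table(unlist(lapply(men, bigrams))), decreasing = TRUE)
bigram_counts
```

Because pairing happens before stop-word removal, "played some video games" contributes only "video games", while "played video games" also yields "played video".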
But if we rank bigrams by their relative importance to each document (tf-idf) rather than by raw frequency, the results are somewhat different: “promotion”, “living life”, and “dating girlfriend” take the lead for men, while “husband surprise” is #1 for women.
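"Relative importance to the document" presumably means tf-idf. Under the definition tidytext's `bind_tf_idf()` uses, tf is a term's count divided by the document's total count, and idf is the natural log of (number of documents / number of documents containing the term). The base-R sketch below assumes the two documents are "all men's moments" and "all women's moments" and uses made-up counts. It shows one way the ranking can reshuffle: with only two documents, a bigram that appears in both gets idf = log(1) = 0, so even a very frequent shared bigram scores zero.

```r
# Made-up bigram counts for two documents: men ("m") and women ("f").
counts <- data.frame(
  doc    = c("m", "m", "m", "f", "f", "f"),
  bigram = c("video games", "got promotion", "friends dinner",
             "husband surprise", "friends dinner", "video games"),
  n      = c(10, 3, 4, 5, 6, 3),
  stringsAsFactors = FALSE
)

# tf: this bigram's share of all bigram occurrences in its document.
counts$tf <- counts$n / ave(counts$n, counts$doc, FUN = sum)

# idf: log(number of documents / number of documents containing the bigram).
n_docs <- length(unique(counts$doc))
docs_with_bigram <- tapply(counts$doc, counts$bigram, function(d) length(unique(d)))
counts$idf <- log(n_docs / as.numeric(docs_with_bigram[counts$bigram]))

counts$tf_idf <- counts$tf * counts$idf
counts[order(counts$doc, -counts$tf_idf), c("doc", "bigram", "n", "tf_idf")]
```

In this toy data "video games" is the most frequent bigram for men but scores zero, so "got promotion" ranks first for men and "husband surprise" first for women.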
```r
library(dplyr)
library(tidyr)
library(ggplot2)

# hm_topics: one row per (topic, term) with its per-topic probability beta
beta_spread <- hm_topics %>%
  mutate(topic = paste0("topic", topic)) %>%
  spread(topic, beta) %>%
  filter(topic1 > .001 | topic2 > .001) %>%   # keep terms common in at least one topic
  mutate(log_ratio = log2(topic2 / topic1)) %>%
  arrange(log_ratio)

# 10 most topic1-leaning and 10 most topic2-leaning terms (tail() replaces hard-coded rows 258:267)
beta_spread_fin <- bind_rows(head(beta_spread, 10), tail(beta_spread, 10))

ggplot(beta_spread_fin, aes(x = reorder(term, -log_ratio), y = log_ratio)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "log2(topic2 / topic1)")
```
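To make the log ratio concrete, the base-R sketch below applies the same `.001` filter and `log2(topic2 / topic1)` score to a handful of made-up per-topic probabilities. The ratio is symmetric around zero: +1 means a term is twice as likely under topic 2, -1 half as likely, and 0 equally likely. The filter presumably exists so that very rare terms cannot produce extreme ratios from tiny probabilities.

```r
# Made-up per-topic word probabilities (beta) for five terms.
betas <- data.frame(
  term   = c("husband", "daughter", "friend", "promotion", "games"),
  topic1 = c(0.0100, 0.0040, 0.0200, 0.0010, 0.0005),
  topic2 = c(0.0025, 0.0020, 0.0200, 0.0040, 0.0004),
  stringsAsFactors = FALSE
)

# Keep terms with beta > .001 in at least one topic ("games" is dropped).
betas <- betas[betas$topic1 > .001 | betas$topic2 > .001, ]

# log2 ratio: positive leans toward topic 2, negative toward topic 1.
betas$log_ratio <- log2(betas$topic2 / betas$topic1)
betas[order(betas$log_ratio), c("term", "log_ratio")]
```

Here "husband" scores -2 (four times likelier under topic 1) and "promotion" scores +2, which is the kind of contrast the bar chart puts at its two ends.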
While playing video games features prominently in the happy moments of men aged 26-30, these men draw comparable happiness from promotions and other achievements and from moments of affection (dating their girlfriends, or their wives giving birth).
For women, family weighs heavily on happy moments: husbands, sons, and daughters come up more often than I initially expected, whereas the word “boyfriend” appears far less frequently.